Search CORE

9 research outputs found

Improved dense trajectories for gesture recognition

Author: Pujol Torramorell Roger
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/07/2018

In this project, we aim to recognize, in real time, a set of previously defined gestures using the Improved Dense Trajectories algorithm.

UPCommons. Portal del coneixement obert de la UPC

Tracking coherence-related contention delays in real-time multicore systems

Author: Abella Ferrer Jaume
Cazorla Almeida Francisco Javier
Hassan Mohamed
Pujol Torramorell Roger
Tabani Hamid
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2023

The prevailing use of multicores in Embedded Critical Systems (ECS) is for multi-application workloads, in which independent applications run on different cores and data sharing is restricted to communication between applications and the real-time operating system. However, thread-level parallelism (e.g., OpenMP) is increasingly used in ECS to improve the performance of individual applications. At the hardware level, we are witnessing increased research efforts to master and improve multicore cache coherence, which plays a key role in enabling efficient data sharing among threads. Despite these efforts, the limited information that performance monitoring counters provide on cache coherence limits the understanding of coherence's impact on task execution time and hence severely constrains the estimation of tight worst-case execution time bounds. Along these lines, this work contributes an analysis of the impact that cache coherence can have on application timing behavior, and a new set of low-overhead performance monitoring counters that track the coherence-related contention that threads can cause on each other when sharing data. Our results show that the proposed performance monitoring counters effectively capture all coherence-related contention that tasks can suffer, and hence are key for the timing validation and verification of parallel software in ECS. Furthermore, they help application optimization by providing key information about data sharing among application threads.

The research leading to these results has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 772773). This work has also been partially supported by Grant PID2019-107255GB-C21 funded by MCIN/AEI/10.13039/501100011033.

Peer Reviewed. Postprint (author's final draft)

UPCommons. Portal del coneixement obert de la UPC
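The abstract above describes counters that attribute coherence-related contention to the thread that causes it. As a rough illustration only, the following Python sketch replays a hypothetical memory-access trace through a simplified MSI-style model (no Exclusive state, no timing) and tallies, for each pair of cores, how many coherence actions one core's accesses force on the other core's cached copies. The function name `coherence_contention` and the trace format are assumptions made for this sketch; it is not the counter design proposed in the paper.

```python
from collections import defaultdict

def coherence_contention(trace):
    """Toy MSI-style model counting coherence actions per (requester, victim) pair.

    trace: iterable of (core, op, line) tuples, with op in {"R", "W"}.
    Returns {(requester, victim): count}; each count is one action that the
    requester's access forced on a copy of the line held in the victim's cache.
    """
    copies = defaultdict(dict)  # line -> {core: "M" or "S"}
    events = defaultdict(int)
    for core, op, line in trace:
        holders = copies[line]
        others = [c for c in holders if c != core]
        if op == "R":
            if core in holders:
                continue  # read hit in own cache: no coherence traffic
            for c in others:
                if holders[c] == "M":
                    events[(core, c)] += 1  # dirty copy must be written back/forwarded
                    holders[c] = "S"
            holders[core] = "S"
        elif op == "W":
            for c in others:
                events[(core, c)] += 1  # remote copy must be invalidated
                del holders[c]
            holders[core] = "M"
        else:
            raise ValueError(f"unknown op {op!r}")
    return dict(events)

# Hypothetical trace: two cores ping-ponging one cache line.
trace = [(0, "W", 0x40), (1, "R", 0x40), (1, "W", 0x40), (0, "R", 0x40)]
print(coherence_contention(trace))  # {(1, 0): 2, (0, 1): 1}
```

The per-pair dictionary mirrors the idea of attributing contention to a specific culprit thread rather than reporting a single aggregate miss count.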

ADBench: benchmarking autonomous driving systems

Author: Abella Ferrer Jaume
Alcón Doganoc Miguel
Cazorla Almeida Francisco Javier
Moya Riera Joan
Pujol Torramorell Roger
Tabani Hamid
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2022

Driven by improvements in a variety of domains, autonomous driving is becoming a reality, and industry today aims to move toward fully autonomous vehicles. High-tech chip manufacturers are designing high-performance, energy-efficient platforms that comply with safety standard requirements. However, the software that implements advanced functionalities in autonomous vehicles challenges the real-time constraints of those platforms. Hence, there is a clear need for industry-level autonomous driving benchmarks to evaluate platforms and systems. In this paper, we propose ADBench, a benchmarking approach and benchmark suite for state-of-the-art autonomous driving platforms that follows the key modules, structural design and functions of autonomous driving systems and builds on several industry-level autonomous driving systems. The use of standard benchmarks facilitates the design, verification and validation of autonomous systems.

This work has been partially supported by the Spanish Ministry of Economy and Competitiveness (MINECO) under Grant TIN2015-65316-P, the SuPerCom European Research Council (ERC) project under the European Union’s Horizon 2020 research and innovation programme (Grant Agreement No. 772773), and the HiPEAC Network of Excellence.

Peer Reviewed. Postprint (author's final draft)

UPCommons. Portal del coneixement obert de la UPC

Vector extensions in COTS processors to increase guaranteed performance in real-time systems

Author: Abella Ferrer Jaume
Cazorla Almeida Francisco Javier
Jorba Jorba Josep
Kosmidis Leonidas
Mezzetti Enrico
Pujol Torramorell Roger
Tabani Hamid
Publication venue: Association for Computing Machinery (ACM)
Publication date: 31/08/2022

The need for increased application performance in high-integrity systems, such as those in avionics, is on the rise as software implements increasingly complex functionalities. The prevalent computing solution for future high-integrity embedded products is multiprocessor systems-on-chip (MPSoCs). MPSoCs include CPU multicores that improve performance via thread-level parallelism, as well as generic accelerators (GPUs) and application-specific accelerators. However, the data processing approach (DPA) required to exploit each of these parallel hardware blocks poses several open challenges to their safe deployment in high-integrity domains. The main challenges include the qualification of the associated runtime system and the difficulty of analyzing programs that deploy the DPA with out-of-the-box timing analysis and code coverage tools. In this work, we perform a thorough analysis of vector extensions (VExt) in current COTS processors for high-integrity systems. We show that VExt avoid many of the challenges that arise with parallel programming models and GPUs. Unlike other DPAs, VExt require no runtime support, prevent by design the race conditions that might arise with parallel programming models, and have minimal impact on the software ecosystem, enabling the use of existing code coverage and timing analysis tools. We develop vectorized versions of neural network kernels and show that the NVIDIA Xavier VExt provide a reasonable increase in guaranteed application performance of up to 2.7x. Our analysis contends that VExt are arguably the DPA with the fastest path to adoption in high-integrity systems.

This work has received funding from the European Research Council (ERC) grant agreement No. 772773 (SuPerCom) and the Spanish Ministry of Science and Innovation (AEI/10.13039/501100011033) under grants PID2019-107255GB-C21 and IJC2020-045931-I.

Peer Reviewed. Postprint (author's final draft)

UPCommons. Portal del coneixement obert de la UPC

Improved dense trajectories for gesture recognition

Author: Pujol Torramorell Roger
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/07/2018

Generating and Exploiting Deep Learning Variants to Increase Utilization of the Heterogeneous Resources in Autonomous Driving Platforms

Author: Pujol Torramorell Roger
Publication venue: Universitat Politècnica de Catalunya
Publication date: 24/06/2020

Deep learning-based solutions, and in particular deep neural networks (DNNs), are being adopted in several core functionalities of critical real-time embedded systems (CRTES), such as those in planes, cars and satellites, ranging from vision-based perception (object detection and object tracking) to trajectory planning. As a result, several deep learning instances run simultaneously on the same computing platform at any given time. Modern computing platforms offer a variety of computing elements (e.g., CPUs, GPUs and specific accelerators) on which those DNN instances can be executed, depending on their computational requirements and temporal constraints. However, most DNNs are currently programmed to exploit a single computing element: the regular cores of the GPU. This lack of variety causes resource imbalance and under-utilization of the various computing elements when several DNN instances execute, increasing the execution time of DNN tasks. In this Thesis, (a) we develop different variants (implementations) of well-known DNN libraries used in the Apollo Autonomous Driving software for each of the computing elements of the latest NVIDIA Xavier system-on-chip. Each variant is configured to balance resource requirements and performance: the regular CPU core implementation can run on 2, 4 or 6 cores (always leaving 2 cores free for other computations); the GPU variants, using regular or Tensor cores, can run on 4 or 8 of the GPU's Streaming Multiprocessors (SMs); and the accelerator variant can run on 1 or 2 NVIDIA Deep Learning Accelerators (NVDLAs). (b) We show that each variant/configuration offers a different resource-utilization/performance point. (c) We show how those heterogeneous computing elements can be exploited by a static scheduler to sustain the execution of multiple, diverse DNN variants on the same platform.

UPCommons. Portal del coneixement obert de la UPC

Improved dense trajectories for gesture recognition

Author: Pujol Torramorell Roger
Publication venue: Universitat Politècnica de Catalunya

RECERCAT

A cross-layer review of deep learning frameworks to ease their optimization and reuse

Author: Abella Ferrer Jaume
Cazorla Almeida Francisco Javier
Pujol Torramorell Roger
Tabani Hamid
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2020

©2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Machine learning, and especially Deep Learning (DL), approaches are at the heart of many domains, from computer vision and speech processing to trajectory prediction in autonomous driving and data science. Those approaches mainly build upon Neural Networks (NNs), which are compute-intensive by nature. A plethora of frameworks, libraries and platforms have been deployed to implement those NNs, but end users often lack guidance on which frameworks, platforms and libraries to use to obtain the best implementation for their particular needs. This paper analyzes the DL ecosystem, providing a structured view of some of the main frameworks, platforms and libraries for DL implementation. We show how those DL applications ultimately build on some form of linear algebra operation, such as matrix multiplication, vector addition or dot product. This analysis shows how optimizations of specific linear algebra functions for specific platforms can be effectively leveraged to maximize specific targets (e.g., performance or power efficiency) at the application level, reusing components across frameworks and domains.

This work has been partially supported by the Spanish Ministry of Economy and Competitiveness under grant TIN2015-65316-P, the SuPerCom European Research Council (ERC) project under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 772773), and the HiPEAC Network of Excellence.

Peer Reviewed. Postprint (author's final draft)

UPCommons. Portal del coneixement obert de la UPC
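The observation above that DL applications ultimately reduce to linear algebra can be illustrated with a single fully connected layer, y = relu(W x + b), which needs only dot products and a vector addition. The pure-Python helpers below (`dot`, `matvec`, `vec_add`, `relu`, `dense_layer`) are hypothetical names chosen for this sketch; they are not functions from any of the frameworks or libraries analyzed in the paper.

```python
def dot(x, y):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))

def matvec(W, x):
    """Matrix-vector product: one dot product per row of W."""
    return [dot(row, x) for row in W]

def vec_add(x, y):
    """Element-wise vector addition."""
    return [a + b for a, b in zip(x, y)]

def relu(x):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in x]

def dense_layer(W, b, x):
    """Fully connected layer y = relu(W x + b), built only from the primitives above."""
    return relu(vec_add(matvec(W, x), b))

W = [[1.0, 2.0], [3.0, -4.0]]
b = [0.5, -0.5]
x = [1.0, 1.0]
print(dense_layer(W, b, x))  # [3.5, 0.0]
```

Because `dense_layer` only calls `dot` and `vec_add`, swapping those primitives for a platform-optimized routine (for example, a BLAS call) would speed up every layer built on them without touching the layer code, which is the kind of cross-framework reuse the abstract points to.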

Empirical evidence for MPSoCs in critical systems: The case of NXP’s T2080 cache coherence

Author: Abella Ferrer Jaume
Cazorla Almeida Francisco Javier
Hassan Mohamed
Pujol Torramorell Roger
Tabani Hamid
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2021

The adoption of complex MPSoCs in critical real-time embedded systems mandates a detailed analysis of their architecture to facilitate certification. This analysis is hindered by the lack of a thorough understanding of the MPSoC, owing to the non-obvious and/or insufficiently documented behavior of some key hardware features. Confidence in those features can only be regained by building specific tests, both to assess whether their behavior matches the specifications and to unveil their behavior when it is not fully known a priori. In this work, we introduce a systematic approach that builds this thorough understanding of the MPSoC architecture, and assesses it against the specification in the processor documentation, with a focus on the cache coherence protocol, using the avionics-relevant NXP T2080 architecture as our use case. Our approach covers all transitions in the MESI cache coherence protocol, with emphasis on coherence between DMA and the processing cores. We build evidence of their behavior based on the available debug support and performance monitors. Our analysis reveals unexpected behavior in coherence-related notifications as well as in some hardware monitors.

This work has been partially supported by the Spanish Ministry of Science and Innovation under grant PID2019-107255GB; the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 878752 (MASTECS) and the European Research Council (ERC) grant agreement No. 772773 (SuPerCom); the HiPEAC Network of Excellence; and the Natural Sciences and Engineering Research Council of Canada (NSERC).

Peer Reviewed. Postprint (author's final draft)

UPCommons. Portal del coneixement obert de la UPC
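Since the T2080 study sets out to cover every transition of the MESI protocol, a compact reference model of textbook MESI can help readers see which (state, event) pairs such tests must exercise. The Python sketch below is the generic Illinois-style MESI found in computer architecture textbooks, under simplifying assumptions: a single cache line, atomic bus events, and DMA treated as just another snooping agent. It does not reproduce the T2080's documented or observed behavior, and the names `on_local`, `on_snoop` and `all_transitions` are hypothetical.

```python
from enum import Enum

class State(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"

def on_local(state, op, others_have_copy):
    """Next state of a line after this core's own read ("R") or write ("W")."""
    if op == "R":
        if state is State.I:
            # read miss: Exclusive if no other cache holds the line, else Shared
            return State.S if others_have_copy else State.E
        return state  # read hit in M, E or S leaves the state unchanged
    if op == "W":
        # E -> M is silent; S or I must first invalidate the other copies
        return State.M
    raise ValueError(f"unknown op {op!r}")

def on_snoop(state, op):
    """Next state after observing a read or write by another agent (core or DMA)."""
    if op == "R":
        # an M copy is written back or forwarded; M, E and S all end up Shared
        return State.I if state is State.I else State.S
    if op == "W":
        return State.I  # any remote write invalidates the local copy
    raise ValueError(f"unknown op {op!r}")

def all_transitions():
    """Enumerate every (start state, event, next state) triple of this model."""
    table = []
    for s in State:
        # the others_have_copy flag only matters on a read miss
        for others in (False, True):
            table.append((s, f"local R, others={others}", on_local(s, "R", others)))
        table.append((s, "local W", on_local(s, "W", False)))
        for op in ("R", "W"):
            table.append((s, f"snoop {op}", on_snoop(s, op)))
    return table

for start, event, end in all_transitions():
    print(f"{start.name} --{event}--> {end.name}")
```

Enumerating the table explicitly makes it easy to check that a test suite has at least one experiment per row, which is the coverage goal the abstract describes.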